
Implement coalesced pooling over entire batches #368

Merged: 2 commits into explosion:master on Mar 13, 2023

Conversation

@shadeMe (Collaborator) commented Jan 27, 2023

Description

This PR ports a feature from curated-transformers that applies the pooling operation (which reduces the piece representations to token-level representations) to entire batches instead of to individual Docs. This significantly reduces the overhead of launching the custom kernel behind the scenes, especially in high-throughput scenarios such as inference.

This change improves the GPU inference performance of the German transformer model (minus the trainable lemmatizer) by 32.5% (20171.7 WPS -> 26725.9 WPS). GPU training speed also sees a modest improvement of 4.6% (3547.7 WPS -> 3713 WPS).
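To make the idea concrete, here is a minimal NumPy sketch of the difference between pooling each Doc's piece representations separately and pooling the whole batch in one coalesced call. The function names, the toy data, and the use of `np.add.reduceat` are illustrative stand-ins only, not the actual kernel or ragged types used by spacy-transformers or curated-transformers.

```python
import numpy as np

def pool_per_doc(doc_pieces, doc_token_piece_counts):
    """One pooling call per Doc: mean-pool each token's wordpiece rows."""
    outputs = []
    for pieces, counts in zip(doc_pieces, doc_token_piece_counts):
        counts = np.asarray(counts)
        starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
        summed = np.add.reduceat(pieces, starts, axis=0)
        outputs.append(summed / counts[:, None])
    return outputs

def pool_coalesced(doc_pieces, doc_token_piece_counts):
    """A single pooling call over the whole batch, then split back per Doc."""
    pieces = np.concatenate(doc_pieces, axis=0)
    counts = np.concatenate([np.asarray(c) for c in doc_token_piece_counts])
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    pooled = np.add.reduceat(pieces, starts, axis=0) / counts[:, None]
    tokens_per_doc = [len(c) for c in doc_token_piece_counts]
    return np.split(pooled, np.cumsum(tokens_per_doc)[:-1])

# Two toy "Docs": 3 and 2 tokens aligned to 5 and 3 wordpieces respectively.
docs = [np.random.rand(5, 4), np.random.rand(3, 4)]
counts = [[2, 1, 2], [1, 2]]
assert all(
    np.allclose(a, b)
    for a, b in zip(pool_per_doc(docs, counts), pool_coalesced(docs, counts))
)
```

Both variants produce the same token vectors; the coalesced version simply replaces one reduction launch per Doc with a single launch per batch, which is where the kernel-launch overhead savings come from.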

Types of change

Feature

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@shadeMe added the enhancement (New feature or request), perf / speed (Performance: speed), and feat / pipeline (Feature: Pipeline components) labels on Jan 27, 2023
@shadeMe marked this pull request as ready for review on January 30, 2023, 18:07
@shadeMe (Collaborator, Author) commented Jan 30, 2023

Rebased on master (sorry about the force push).

@danieldk (Contributor) commented Feb 2, 2023

This breaks backward compatibility with earlier models (see CI). @adrianeboyd, is that acceptable? If not, the model version should probably be bumped? (Or the concat could be implemented as standalone functions that are used in trfs2arrays, but that would be less elegant.)

@adrianeboyd (Contributor) commented:

I think you'll need a bunch of new architectures. At first glance it feels like this should be in parallel to the existing trfs2arrays instead of modifying it? And it feels like there must be some low-level tests missing, because I'm surprised that at the very least the type checking didn't break for all the Tok2VecTransformer versions.

@shadeMe (Collaborator, Author) commented Feb 2, 2023

On second thought, I'm kinda leaning towards using standalone functions for the concatenation as that'll let existing models/configs benefit from the change and will touch less code. Apart from the implementation being more unwieldy, are there any downsides to it?

> At first glance it feels like this should be in parallel to the existing trfs2arrays instead of modifying it?

Hmm, what would be the benefit of having a separate version of trfs2arrays just for the pooling behaviour? As I see it, it's more of an implementation detail than a parameter that we'd want users to care about. One alternative would be to refactor trfs2arrays and let it be chained with the pooler, but that has the same issue of breaking compatibility with existing models.
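For illustration, a rough sketch of the standalone-function idea being discussed here: plain helpers that concatenate the per-Doc ragged inputs so the existing trfs2arrays code path can invoke the pooler once per batch and then split the result back per Doc. The helper names and NumPy types below are hypothetical, not the code that was merged; the relevant property is that no new registered architecture is introduced, so existing configs and models keep working.

```python
import numpy as np

def concat_ragged(doc_pieces, doc_token_piece_counts):
    """Concatenate per-Doc piece arrays and per-token piece counts into one
    batch-level ragged layout suitable for a single pooling call."""
    pieces = np.concatenate(doc_pieces, axis=0)
    lengths = np.concatenate(
        [np.asarray(counts, dtype="int32") for counts in doc_token_piece_counts]
    )
    tokens_per_doc = [len(counts) for counts in doc_token_piece_counts]
    return pieces, lengths, tokens_per_doc

def split_pooled(pooled, tokens_per_doc):
    """Split the batch-level pooled output back into one array per Doc."""
    return np.split(pooled, np.cumsum(tokens_per_doc)[:-1])
```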

@adrianeboyd (Contributor) commented:

You have to keep the existing trfs2arrays for the legacy versions of a bunch of architectures.

@shadeMe (Collaborator, Author) commented Feb 3, 2023

Reimplemented with standalone functions to preserve backward compatibility (tested with models from spaCy v3.4.0 and v3.5.0).

@danieldk merged commit 3965a78 into explosion:master on Mar 13, 2023